Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets

Author

  • Aleksandar Stojmirovic
Abstract

A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. Quasi-metrics were first introduced in the 1930s and are a subject of intensive research in the context of topology and theoretical computer science. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures on biological (nucleotide or protein) sequences and quasi-metrics. As sequence similarity search is one of the most important techniques of modern bioinformatics, this motivates a new direction of research: development of geometric aspects of the theory of quasi-metric spaces and its applications to similarity search in general and large protein datasets in particular. The thesis starts by presenting basic concepts of the theory of quasi-metric spaces, illustrated by numerous examples, some previously known, some novel. In particular, the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space, are constructed. Sets of biological sequences with some commonly used similarity measures provide a further, and the most important, example. Chapter 4 is dedicated to the development of the notion of a quasi-metric space with a Borel probability measure, or pq-space. The concept of a pq-space generalises the notion of an mm-space from asymptotic geometric analysis: an mm-space is a metric space with a Borel measure that provides the framework for the study of the phenomenon of concentration of measure on high dimensional structures. While some concepts and results are direct extensions of results about mm-spaces, some are intrinsic to the quasi-metric case. One of the main results of this chapter indicates that ‘a high dimensional quasi-metric space is close to being a metric space’. Chapter 5 investigates the geometric aspects of the theory of database similarity search. It extends the existing concepts of a workload and an indexing scheme in order to cover more general cases and introduces the concept of a quasi-metric tree as an analogue of a metric tree, a popular class of access methods for metric datasets. The results about pq-spaces are used to produce some new theoretical bounds on the performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. The performance of FSIndex turns out to be very good in comparison with existing access methods. Chapter 7 presents the prototype of a system for the discovery of short functional protein motifs called PFMFind, which relies on FSIndex for similarity searches.
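
To make the opening definition concrete, the following minimal Python sketch checks the quasi-metric axioms by brute force on a small finite set. It is an illustration only and is not taken from the thesis: the helper names `is_quasi_metric`, `upper_quasi_metric` and `symmetrise` are hypothetical, and the example distance d(x, y) = max(y - x, 0) on numbers is a standard textbook quasi-metric, not one of the sequence-based quasi-metrics the thesis constructs.

```python
from itertools import product


def is_quasi_metric(points, d):
    """Brute-force check of the quasi-metric axioms on a finite set.

    Axioms checked (symmetry is deliberately NOT required):
      1. d(x, x) == 0 and d(x, y) >= 0
      2. d(x, y) == 0 and d(y, x) == 0 together imply x == y
      3. d(x, z) <= d(x, y) + d(y, z)   (triangle inequality)
    """
    for x in points:
        if d(x, x) != 0:
            return False
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return False
        if x != y and d(x, y) == 0 and d(y, x) == 0:
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True


def upper_quasi_metric(x, y):
    """Illustrative asymmetric distance: moving 'up' costs, moving 'down' is free."""
    return max(y - x, 0)


def symmetrise(d):
    """Associated metric obtained by taking the larger of the two directions."""
    return lambda x, y: max(d(x, y), d(y, x))


points = [0, 1, 3, 7]
print(is_quasi_metric(points, upper_quasi_metric))   # axioms hold on this set
print(upper_quasi_metric(1, 3), upper_quasi_metric(3, 1))  # asymmetric values
```

Taking the maximum of the two directions, as in `symmetrise`, recovers the ordinary distance |x - y| for this example, which is one simple way to see a quasi-metric as an "asymmetric metric".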

Similar resources

Warped product and quasi-Einstein metrics

Warped products provide a rich class of physically significant geometric objects. The warped product construction is an important method for producing a new metric from a base manifold and a fibre. We construct compact base manifolds with positive scalar curvature which do not admit any non-trivial quasi-Einstein warped product, and non-compact complete base manifolds which do not admit any non-triv...

Solution of Vacuum Field Equation Based on Physics Metrics in Finsler Geometry and Kretschmann Scalar

The Lemaître-Tolman-Bondi (LTB) model represents an inhomogeneous spherically symmetric universe filled with freely falling dust-like matter without pressure. First, we have considered a Finslerian ansatz of (LTB) and have found a Finslerian exact solution of the vacuum field equation. We have obtained R(t,r) and S(t,r) and thereby establish a new solution of Rµν = 0. Moreover, we attempt to use Finsler geo...

On quasi-Einstein Finsler spaces

The notion of quasi-Einstein metric in physics is equivalent to the notion of Ricci soliton in Riemannian spaces. Quasi-Einstein metrics serve also as solutions to the Ricci flow equation. Here, the Riemannian metric is replaced by a Hessian matrix derived from a Finsler structure and a quasi-Einstein Finsler metric is defined. In the compact case, it is proved that the quasi-Einstein met...

Indexing Schemes for Similarity Search: an Illustrated Paradigm

We suggest a variation of the Hellerstein–Koutsoupias–Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used to index into metric spaces and facil...

A Geometry Preserving Kernel over Riemannian Manifolds

The kernel trick and projection to tangent spaces are two choices for linearizing data points lying on Riemannian manifolds. These approaches are used to provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to a high dimensional feature space without considering the intrinsic geometry of the data points. ...

Journal:
  • CoRR

Volume abs/0810.5407

Publication date 2005